CHAPTER 16 Getting Straight Talk on Straight-Line Regression 229

With the output shown in Figure 16-4, where the intercept (a) is 76.9 and the

slope (b) is 0.487, you can write the equation of the fitted straight line like this:

SBP = 76.9 + 0.487 × Weight.

Then you can use this equation to predict someone’s SBP if you know their weight.

So, if a person weighs 100 kilograms, you can estimate that that person’s SBP will

be around 76.9 + 0.487 × 100, which is 76.9 + 48.7, or about 125.6 mmHg. Your

prediction probably won’t be exactly on the nose, but it should be better than not

using a predictive model and just guessing.
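
The arithmetic above can be sketched as a small Python function. This is only an illustration: the function name predict_sbp and its parameter names are our own assumptions, and the default intercept and slope are the values quoted from Figure 16-4.

```python
def predict_sbp(weight_kg, intercept=76.9, slope=0.487):
    """Predict SBP (mmHg) from weight (kg) using the fitted straight line.

    The defaults are the intercept (a) and slope (b) quoted in the text;
    the function name and signature are illustrative assumptions.
    """
    return intercept + slope * weight_kg


# A person weighing 100 kilograms
print(round(predict_sbp(100), 1))
```

Keeping the intercept and slope as parameters means the same function can be reused if you refit the line to different data.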

How far off will your prediction be? The residual SE provides a unit of measurement

to answer this question. As we explain in the earlier section “Summary statistics

for the residuals,” the residual SE indicates how much the individual points

tend to scatter above and below the fitted line. For the SBP example, this number

is 9.8, so you can expect your prediction to be within about 10 mmHg most of

the time.
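
If you want to see where a residual SE comes from, the following Python sketch computes it as the square root of the sum of squared residuals divided by n − 2, which is the usual definition for a straight-line fit. The weights and SBP readings are made up purely for illustration (they are not the data behind Figure 16-4), and the function names fit_line and residual_se are our own assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_se(x, y):
    """Residual standard error: sqrt(sum of squared residuals / (n - 2))."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


# Made-up weights (kg) and SBP readings (mmHg), for illustration only.
weights = [60, 70, 80, 90, 100, 110]
sbps = [108, 115, 112, 124, 121, 135]
print(round(residual_se(weights, sbps), 1))
```

The divisor is n − 2 rather than n because two quantities, the intercept and the slope, were estimated from the same data.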

Recognizing What Can Go Wrong with

Straight-Line Regression

Fitting a straight line to a set of data is a relatively simple task, but you still have

to be careful. A computer program does whatever you tell it to, even if it’s something

you shouldn’t do.

Those new to straight-line regression may slip up in the following ways:

» Fitting a straight line to curved data: Examining the pattern of residuals in

the residuals versus fitted chart in Figure 16-5 can let you know if you have

this problem.

» Ignoring outliers in the data: Outliers — especially those in the corners of a

scatterplot like the one in Figure 16-3 — can mess up all the classical statistical

analyses, and regression is no exception. One or two data points that are way

off the main trend of the points will drag the fitted line away from the other

points. That’s because the strength with which each point tugs at the fitted

line is proportionate to the square of its distance from the line, and outliers

have a lot of distance, so they have a strong influence.

Always look at a scatterplot of your data to make sure outliers aren’t present.

Examine the residuals to ensure they are distributed normally above and

below the fitted line.
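
Both pitfalls in the list above can be demonstrated with a few lines of Python. This is a minimal sketch using made-up numbers, not the SBP data from this chapter; the helper names fit_line and residuals are our own assumptions. The first part fits a straight line to deliberately curved data and inspects the residuals; the second part changes a single point into a corner outlier and compares the slopes.

```python
def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Observed minus fitted values for the least-squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5, 6, 7]

# Pitfall 1: curved data (made up: y = x squared).
# The residuals are positive at both ends and negative in the middle,
# a systematic pattern rather than random scatter around zero.
curved_y = [xi ** 2 for xi in x]
curved_res = residuals(x, curved_y)
print(curved_res)

# Pitfall 2: one outlier in a corner of the scatterplot.
# Made-up data lying exactly on y = 2x + 1, then the last point is
# replaced with a value far below the trend.
clean_y = [2 * xi + 1 for xi in x]
outlier_y = clean_y[:-1] + [0]
_, clean_slope = fit_line(x, clean_y)
_, outlier_slope = fit_line(x, outlier_y)
print(clean_slope, outlier_slope)
```

In this toy example, a single bad point out of seven is enough to pull the slope well below its true value of 2, which is why looking at the scatterplot before trusting the fit matters.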